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Abstract 

Multiple hypothesis testing is a fundamental problem in high dimensional 
inference, with wide applications in many scientific fields. In genome-wide as- 
sociation studies, tens of thousands of tests are performed simultaneously to 
find if any SNPs are associated with some traits and those tests are correlated. 
When test statistics are correlated, false discovery control becomes very chal- 
lenging under arbitrary dependence. In the current paper, we propose a novel 
method based on principal factor approximation, which successfully subtracts 
the common dependence and weakens significantly the correlation structure, 
to deal with an arbitrary dependence structure. We derive an approximate 
expression for false discovery proportion (FDP) in large scale multiple testing 
when a common threshold is used and provide a consistent estimate of realized 
FDP. This result has important applications in controlling FDR and FDP. Our 
estimate of realized FDP compares favorably with Efron (2007) 's approach, as 
demonstrated in the simulated examples. Our approach is further illustrated by 
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some real data applications. We also propose a dependence-adjusted procedure, 
which is more powerful than the fixed threshold procedure. 

Keywords: Multiple hypothesis testing, high dimensional inference, false discovery 
rate, arbitrary dependence structure, genome-wide association studies. 

1 Introduction 

Multiple hypothesis testing is a fundamental problem in the modern research for 
high dimensional inference, with wide applications in scientific fields, such as biology, 
medicine, genetics, neuroscience, economics and finance. For example, in genome- 
wide association studies, massive amount of genomic data (e.g. SNPs, eQTLs) are 
collected and tens of thousands of hypotheses are tested simultaneously to find if any 
of these genomic data are associated with some observable traits (e.g. blood pressure, 
weight, some disease); in finance, thousands of tests are performed to see which fund 
managers have winning ability (Barras, Scaillet & Wermers 2010). 

False Discovery Rate (FDR) has been introduced in the celebrated paper by Ben- 
jamini & Hochberg (1995) for large scale multiple testing. By definition, FDR is the 
expected proportion of falsely rejected null hypotheses among all of the rejected null 
hypotheses. The classification of tested hypotheses can be summarized in Table 1: 
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Various testing procedures have been developed for controlling FDR, among which 
there are two major approaches. One is to compare the P-values with a data-driven 
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threshold as in Benjamini & Hochberg (1995). Specifically, let p^) < p( 2 ) < ••• < 
be the ordered observed P-values of p hypotheses. Define k = max j? : p^ < 
ia/p^ and reject H^,--- ,H^, where a is a specified control rate. If no such i 
exists, reject no hypothesis. The other related approach is to fix a threshold value t, 
estimate the FDR, and choose t so that the estimated FDR is no larger than a (Storey 
2002). Finding such a common threshold is based on a conservative estimate of FDR. 
Specifically, let FDR(t) = p t/ (R(t) V 1), where R(t) is the number of total discoveries 
with the threshold t and p is an estimate of p . Then solve t such that FDR(t) < a. 
The equivalence between the two methods has been theoretically studied by Storey, 
Taylor & Siegmund (2004) and Ferreira & Zwinderman (2006). 

Both procedures have been shown to control the FDR for independent test statis- 
tics. However, in practice, test statistics are usually correlated. Although Benjamini 
& Yekutieli (2001) and Clarke & Hall (2009) argued that when the null distribution 
of test statistics satisfies some conditions, dependence case in the multiple testing 
is asymptotically the same as independence case, multiple testing under general de- 
pendence structures is still a very challenging and important open problem. Efron 
(2007) pioneered the work in the field and noted that correlation must be accounted 
for in deciding which null hypotheses are significant because the accuracy of false dis- 
covery rate techniques will be compromised in high correlation situations. There are 
several papers that show the Benjamini-Hochberg procedure or Storey's procedure 
can control FDR under some special dependence structures, e.g. Positive Regression 
Dependence on Subsets (Benjamini & Yekutieli 2001) and weak dependence (Storey, 
Taylor & Siegmund 2004). Sarkar (2002) also shows that FDR can be controlled by a 
generalized stepwise multiple testing procedure under positive regression dependence 
on subsets. However, even if the procedures are valid under these special dependence 
structures, they will still suffer from efficiency loss without considering the actual de- 
pendence information. In other words, there are universal upper bounds for a given 
class of covariance matrices. 
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In the current paper, we will develop a procedure for high dimensional multiple 
testing which can deal with any arbitrary dependence structure and fully incorpo- 
rate the covariance information. This is in contrast with previous literatures which 
consider multiple testing under special dependence structures, e.g. Sun & Cai (2009) 
developed a multiple testing procedure where parameters underlying test statistics 
follow a hidden Markov model, and Leek & Storey (2008) and Friguet, Kloareg & 
Causeur (2009) studied multiple testing under the factor models. More specifically, 
consider the test statistics 

(Z ir .. ,Z p ) T ^N((^,... ,/i p ) T ,S), 

where S is known and p is large. We would like to simultaneously test : /ij = 
vs Hu : ^ ^ for % = 1, • • • ,p. Note that S can be any non- negative definite 
matrix. Our procedure is called Principal Factor Approximation (PFA). The basic 
idea is to first take out the principal factors that derive the strong dependence among 
observed data Z±, • • ■ , Z p and to account for such dependence in calculation of false 
discovery proportion (FDP). This is accomplished by the spectral decomposition of £ 
and taking out the largest common factors so that the remaining dependence is weak. 
We then derive the asymptotic expression of the FDP, defined as V/ R, that accounts 
for the strong dependence. The realized but unobserved principal factors that derive 
the strong dependence are then consistently estimated. The estimate of the realized 
FDP is obtained by substituting the consistent estimate of the unobserved principal 
factors. 

We are especially interested in estimating FDP under the high dimensional sparse 
problem, that is, p is very large, but the number of jii ^ is very small. In section 
2, we will explain the situation under which £ is known. Sections 3 and 4 present 
the theoretical results and the proposed procedures. In section 5, the performance 
of our procedures is critically evaluated by various simulation studies. Section 6 is 
about the real data analysis. All the proofs are relegated to the Appendix and the 
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Supplemental Material. 



2 Motivation of the Study 

In genome-wide association studies, consider p SNP genotype data for n individual 
samples, and further suppose that a response of interest (i.e. gene expression level 
or a measure of phenotype such as blood pressure or weight) is recorded for each 
sample. The SNP data are conventionally stored in an n x p matrix X = (:%), with 
rows corresponding to individual samples and columns corresponding to individual 
SNPs . The total number n of samples is in the order of hundreds, and the number 
p of SNPs is in the order of tens of thousands. 

Let Xj and Y denote, respectively, the random variables that correspond to the 
jth SNP coding and the outcome. The biological question of the association between 
genotype and phenotype can be restated as a problem in multiple hypothesis testing, 
i.e., the simultaneous tests for each SNP j of the null hypothesis Hj of no association 
between the SNP Xj and Y. Let {Xj}™ =1 be the sample data of Xj, {Y*}^ be the 
independent sample random variables of Y. Consider the marginal linear regression 
between {F*}™ =1 and {X]}™ =1 : 

1 n 

( aj , (3j) = argmin a , b . - ^ E{Y l - aj - bjX*) 2 , j = 1, • • • , p. (1) 

i=l 

We wish to simultaneously test the hypotheses 

H 0j : pj = vs H ir . fr^O, j = l,...,p (2) 

to see which SNPs are correlated with the phenotype. 

Recently statisticians have increasing interests in the high dimensional sparse 
problem: although the number of hypotheses to be tested is large, the number of 
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false nulls (j3j ^ 0) is very small. For example, among 2000 SNPs, there are maybe 
only 10 SNPs which contribute to the variation in phenotypes or certain gene ex- 
pression level. Our purpose is to find these 10 SNPs by multiple testing with some 
statistical accuracy. 

Because of the sample correlations among {Xfi'^'jlf , the least-squares estimators 
{/3j}*j =1 for {Pj}? =l in (1) are also correlated. The following result describes the joint 
distribution of {/3j}^ =v The proof is straightforward. 

Proposition 1. Let f3j be the least-squares estimator for (3j in (1) based on n data 
points, sm be the sample correlation between Xk and X\. Assume that the conditional 
distribution of Y % given X[, • • • , X % p is N(fi(X{, • • • , X*), a 2 ). Then, conditioning on 
{X^'jZl the joint distribution of {&}? =1 is U ■ ■ ■ , %) T ~ N((fa, ■ ■ ■ ,/3 P f ,£*), 
where the (k,l)th element in S* is Y,* kl = a 2 Ski/ (nskkSu) ■ 

For ease of notation, let Z ± ,--- ,Z p be the standardized random variables of 
ft,-- - ,/3 p , that is, 

ry Pi Pi -1 / \ 

t = !,-■■ ,p. (3) 



SD(A) v/iVnSid' 

In the above, we implicitly assume that a is known and the above standardized 
random variables are z-test statistics. The estimate of residual variance a 2 will be 
discussed in Section 6 via refitted cross-validation (Fan, Guo & Hao, 2011). Then, 
conditioning on {Xj}, 

(Z,,-.. ,Z p ) T -Nd^,..- ,/i p ) T ,S), (4) 

where \ii = y/n/3iSu/a and covariance matrix £ has the (k,l)th element as sm- Si- 
multaneously testing (2) based on • • • , (3 P ) T is thus equivalent to testing 

H 0j : Hj = vs H 1:j : fij ^ 0, j = 1, • • • ,p (5) 
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based on (Z%, • • • , Z P ) T . 

In (4), S is the population covariance matrix of (Z\, • • ■ , Z p ) , and is known based 
on the sample data {-Xj}- The covariance matrix S can have arbitrary dependence 
structure. We would like to clarify that S is known and there is no estimation of the 
covariance matrix of Xi, ■ ■ ■ , X p in this set up. 

3 Estimating False Discovery Proportion 

From now on assume that among all the p null hypotheses, po of them are true and 
Pi hypotheses (pi = p — Po) are false, and p\ is supposed to be very small compared 
to p. For ease of presentation, we let p be the sole asymptotic parameter, and assume 
Po — > oo when p — > oo. For a fixed rejection threshold t, we will reject those P-values 
no greater than t and regard them as statistically significance. Because of its powerful 
applicability, this procedure has been widely adopted; see, e.g., Storey (2002), Efron 
(2007, 2010), among others. Our goal is to estimate the realized FDP for a given t in 
multiple testing problem (|5| based on the observations Q under arbitrary dependence 
structure of S. Our methods and results have direct implications on the situation 
in which S is unknown, the setting studied by Efron (2007, 2010) and Friguet et al 
(2009). In the latter case, S needs to be estimated with certain accuracy. 

3.1 Approximation of FDP 

Define the following empirical processes: 

V(t) = #{true null Pi : P { < t}, 

S(t) = #{false null P { : P, < t} and 

R(t) = #{Pi:P,<t}, 
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where t e [0,1]. Then, V(t), S(t) and R(t) are the number of false discoveries, 
the number of true discoveries, and the number of total discoveries, respectively. 
Obviously, R(t) = V(t) + S(t), and V(t), S(t) and R(t) are all random variables, 
depending on the test statistics (Zi, ■ ■ ■ , Z P ) T . Moreover, R(t) is observed in an 
experiment, but V(t) and S(t) are both unobserved. 

By definition, FDP(t) = V(t)/R(t) and FDR(t) = E V(t)/R(t) . One of interests 
is to control FDR(t) at a predetermined rate a, say 15%. There are also substantial 
research interests in the statistical behavior of the number of false discoveries V(t) 
and the false discovery proportion FDP(t), which are unknown but realized given an 
experiment. One may even argue that controlling FDP is more relevant, since it is 
directly related to the current experiment. 

We now approximate V(t)/R(t) for the high dimensional sparse case p\ <C p. 
Suppose (Zi, ■ ■ ■ , Z P ) T ~ JV((//i, • • • , fi p ) T , S) in which S is a correlation matrix. We 
need the following definition for weakly dependent normal random variables; other 
definitions can be found in Farcomeni (2007). 

Definition 1. Suppose (K 1 , ■ • • , K p ) T ~ N((9 1 , ■ ■ ■ , P ) T , A). Then K ± , ■ ■ ■ ,K p are 

called weakly dependent normal variables if 



lim p 2 y^\a ij \ = 0, (6) 

where denote the (i,j)th element of the covariance matrix A. 

Our procedure is called principal factor approximation (PFA). The basic idea is 
to decompose any dependent normal random vector as a factor model with weakly 
dependent normal random errors. The details are shown as follows. Firstly apply 
the spectral decomposition to the covariance matrix S. Suppose the eigenvalues of 
S are Ai, • • • , X p , which have been arranged in decreasing order. If the corresponding 
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orthonormal eigenvectors are denoted as 7 1; • • • , 7 , then 



E = j> 7 <7r (7) 

i=i 

If we further denote A = Y7i=k+i ^ililJ f° r an integer k, then 

l|A||^ = A2 +1 + ... + Aj, (8) 

where || ■ \\p is the Frobenius norm. Let L = (a/Ai7 1; ■ ■ ■ , V%-7fc), which is a p x 
matrix. Then the covariance matrix S can be expressed as 

£ = LL T + A, (9) 

and Z 1 ,--- , Z p can be written as 

Z i = iH + bTW + K i , i = l,---,p, (10) 

where b; = (6 il; • • • , 6 ifc ) T , (b lj7 ■■■ , b pj ) T = ^/Xjjj, the factors are W = (W 1 , ■ ■ ■ , W k f 
~ Nk(0,Ik) an d the random errors are (Ki,--- ,K p ) T ~ N(0,A). Furthermore, 
W'lj ■ ■ ■ , are independent of each other and independent of K\, • ■ • , K p . Chang- 



ing a probability if necessary, we can assume (10) is the data generation process. In 



expression (10), {/ij = 0} correspond to the true null hypotheses, while {fa 7^ 0} 



correspond to the false ones. Note that although (10) is not exactly a classical mul- 
tifactor model because of the existence of dependence among K±,-- - ,K P , we can 
nevertheless show that (Ki, • • • , K p ) T is a weakly dependent vector if the number of 
factors k is appropriately chosen. 

We now discuss how to choose k such that (Ki, ■ ■ ■ ,K P ) T is weakly dependent. 
Denote by the (i,j)th element in the covariance matrix A. If we have 



V 



-\\ 2 k+1 + ■■■ + \ 2 p ) 1/2 — ► as p -+ 00, (11) 
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then by the Cauchy-Schwartz inequality 



p~ 2 M < P _1 H a IIf = P~\^k+i + • • • + Ap) 1/2 — > as p oo. 



h3 



Note that £f =1 Aj = ^"(53) = p, so that (11) is self-normalized. Note also that 



the left hand side of (11) is bounded by p _1 / 2 Afc+i which tends to zero whenever 
Xk+i = o(p 1 / 2 ). Therefore, we can assume that the A& > cp 1 ' 2 for some c > 0. In 
particular, if Ai = o(p 1//2 ), the matrix S is weak dependent and k — 0. In practice, 
we always choose the smallest k such that 



A| +1 H h A 2 



x \ < £ 

Ai H + A p 

holds for a predetermined small e, say, 0.01. 

Theorem 1. Suppose (Zi, ■ ■ ■ , Z P ) T ~ N((fii, ■ ■ ■ , /x p ) T , 53). Choose an appropriate 
k such that 



(CO) 



Ala + --- + A 2 
Ai H h A p 



0(p" 5 ) /or <5>0. 



Le£ vA/7,- = (fry, • • • , 6 PJ ) T for j = !,-■■ , fc. TTien, 



lim (fDP(*)- 





${ai(z t / 2 + rji)) + $(aj(z t/2 - 








*(aj(^/2 + Vi + Vi)) + ®{ai{zt/2 - 7)i 





}=0 a.s., (12) 



w/ierea; = (l-££=i 6 iJ -1/2 > ^ = bf W m& b, = (6 a , • • • M) T and W ~ N k (0, h 



in (10), and $(-) and 2 t / 2 = $ 1 {t/2) are i/ie cumulative distribution function and 



the t/2 lower quantile of a standard normal distribution, respectively. 



Note that condition (CO) implies that Ki, ■ ■ ■ , K p are weakly dependent random 



variables, but (11) converges to zero at some polynomial rate of p. 



Theorem 1 gives an asymptotic approximation for FDP(t) under general depen- 
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dence structure. To the best of our knowledge, it is the first result to explicitly spell 
out the impact of dependence. It is also closely connected with the existing results 
for independence case and weak dependence case. Let = for % = 1, • • • ,p and 



h = 1, • • • ,k in (10) and Ki, ■ ■ ■ ,K P be weakly dependent or independent normal 



random variables, then it reduces to the weak dependence case or independence case, 



respectively. In the above two specific cases, the numerator of (12) is just p t. Storey 
(2002) used an estimate for p , resulting an estimator of FDP(t) as pot/R(t). This 
estimator has been shown to control the false discovery rate under independency and 
weak dependency. However, for general dependency, Storey's procedure will not work 
well because it ignores the correlation effect among the test statistics, as shown by 



(12). Further discussions for the relationship between our result and the other leading 



research for multiple testing under dependence are shown in Section 3.4. 

The results in Theorem 1 can be better understood by some special dependence 
structures as follows. These specific cases are also considered by Roquain & Villers 
(2010), Finner, Dickhaus & Roters (2007) and Friguet, Kloareg & Causeur (2009) 
under somewhat different settings. 

Example 1: [Equal Correlation] If £ has pij = p E [0, 1) for i j, then we 
can write 

Zi = Pi + ^fpW + a/1 - pKi i = 1, ■ ■ ■ ,p 

where W ~ N(0, 1), Ki ~ N(0, 1), and W and all K^s are independent of each other. 
Thus, we have 



lim 

p— >oo 



FDP(i) 



Po 


${d{z t/2 + y/pW)) + ${d{Zt/2 ~ y/pW)) 




Z-/i=l 


Hd(z t/2 + JpW + Pi)) + <5>{d{z t/2 - JpW 


-Hi)) 



a.s., 



where d = (1 — p) 1 ^ 2 . 

Note that Delattre & Roquain (2011) studied the FDP in a particular case of 
equal correlation. They provided a slightly different decomposition of {Z,j} p i=l in the 
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proof of Lemma 3.3 where the errors K^s have a sum equal to 0. Finner, Dickhaus 
& Roters (2007) in their Theorem 2.1 also shows a result similar to Theorem 1 for 
equal correlation case. 

Example 2: [Multifactor Model] Consider a multifactor model: 

Zi = Hi + rji + a~ l K h i = 1, • • • , p, (13) 

where rji and a, are defined in Theorem 1 and ~ N(0, 1) for % = 1, • • • ,p. All the 
WVs and K^s are independent of each other. In this model, W%, • • • , Wk are the fc 



common factors. By Theorem 1, expression (12) holds 



Note that the covariance matrix for model (13) is 



£ = LI/ + diag(a7/, - • ■ ,a~ 2 ). 

When {cij} is not a constant, columns of L are not necessarily eigenvectors of S. 
In other words, when the principal component analysis is used, the decomposition 
of @ can yield a different L and condition (lll|) can require a different value of A;. 



In this sense, there is a subtle difference between our approach and that in Friguet, 
Kloareg & Causeur (2009) when the principal component analysis is used. Theorem 
1 should be understood as a result for any decomposition (|9| that satisfies condition 
(CO). Because we use principal components as approximated factors, our procedure is 
called principal factor approximation. In practice, if one knows that the test statistics 
comes from a factor model structure, multiple testing procedure based on this factor 
model should be preferable. However, when such factor structure is not clear, our 
procedure can deal with an arbitrary covariance dependence case. 

In Theorem 1, since FDP is bounded by 1, taking expectation on both sides of 



the equation (12) and by the Portmanteau lemma, we have the convergence of FDR: 
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Corollary 1. Under the assumptions in Theorem 1, 



lim <^ FDR(t) - E 

p— >oo 



d{true null} 



{$(ai(z t/2 + rji)) + ${ai{z t / 2 - 774))} 



Ya=i {^(o»(^/a + Vi + Mi)) + ^(oi(^/2 - »/i - A*.))} 



(14) 



The expectation on the right hand side of ( 14 ) is with respect to standard multi- 
variate normal variables (Wi, • • • , W k ) T ~ N k (0, I k ). 



The proof of Theorem 1 is based on the following result. 



Proposition 2. Under the assumptions in Theorem 1, 



lim 

p— >oo . 



p 1 R{t) - p 1 [$0i0t/2 + Vi + AO) + $0i0t/2 - »7i - Mi)) = a.s., (15) 



i=i 



lim 

p— >oo 



PoV^-Po 1 [^(oi(^/2 + »7t)) +$(a»(*t/2 -^))JJ = a.s..(16) 

i£{true null} 



The proofs of Theorem 1 and Proposition 2 are shown in the Appendix. 



3.2 Estimating Realized FDP 

In Theorem 1 and Proposition 2, the summation over the set of true null hypotheses 
is unknown. However, due to the high dimensionality and sparsity, both p and po are 
large and p\ is relatively small. Therefore, we can use 



Y h(ai(z t/2 + Vi)) + $(ai(*t/2 - Vi)) 



(17) 



i=i 



as a conservative surrogate for 



Y \$(ai(z t/2 + rji)) + Hoi(zt/ 2 - Vi)) 

i£ {true null} 
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Since only p\ extra terms are included in (17), the substitution is accurate enough 
for many applications. 

Recall that FDP(t) = V(t)/R(t), in which R(t) is observable and known. Thus, 
only the realization of V(t) is unknown. The mean of V(t) is E ^ie{true null) — 
t) = Pot-, since the P- values corresponding to the true null hypotheses are uniformly 
distributed. However, the dependence structure affect the variance of V(t), which can 
be much larger than the binomial formula pot(l — t). Owen (2005) has theoretically 
studied the variance of the number of false discoveries. In our framework, expression 



(17) is a function of i.i.d. standard normal variables. Given t, the variance of (17) 
can be obtained by simulations and hence variance of V(t) is approximated via (17). 
Relevant simulation studies will be presented in Section 5. 

In recent years, there have been substantial interests in the realized random vari- 
able FDP itself in a given experiment, instead of controlling FDR, as we are usually 
concerned about the number of false discoveries given the observed sample of test 
statistics, rather than an average of FDP for hypothetical replications of the ex- 
periment. See Genovese & Wasserman (2004), Meinshausen (2005), Efron (2007), 
Friguet et al (2009), etc. In our problem, by Proposition 2 it is known that V(t) is 
approximately 

maiizt/2 + rji)) + ®(ai(zt/2 ~ Vi)) • (19) 



Let 



FDP A (t) = (J2 [$(<H(zt/2 + rh)) + $(0^/2 -%))]) /#(*), 

i=l 

if R(t) 7^ and FDPa(^) = when R(t) = 0. Given observations Z\, ■ ■ ■ ,z p of the 
test statistics Z±, • • • , Z p , if the unobserved but realized factors W±, • • • , Wk can be 
estimated by Wi, ■ ■ • , Wk, then we can obtain an estimator of FDPa(^) by 

p 

FDP(t) =min(^ [$(0^/2 + Vi)) + $(ai(*/2 - ft))] , R{t))/R{t), (20) 

i=i 
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when R(t) ^ and FDP(t) = when R(t) = 0. Note that in (|20j), % = YX=i h ihWh 
is an estimator for rfc = b^W. 

The following procedure is one practical way to estimate W = (W\, ■ ■ ■ , Wk) T 
based on the data. For observed values Zx, ■ ■ ■ ,z p , we choose the smallest 90% of 
l^il's, say. For ease of notation, assume the first m Zi's have the smallest absolute 
values. Then approximately 

Z i = bfW + K ii i = l,-,m. (21) 



The approximation from (10) to (21 ) stems from the intuition that large |/^|'s tend to 



produce large |zj|'s as Z$ ~ 1) 1 < i < p and the sparsity makes approximation 



errors negligible. Finally we apply the robust Li-regression to the equation set (21) 



and obtain the least-absolute deviation estimates Wi, ■ • • , Wk- We use Li-regression 



rather than /^-regression because there might be nonzero //j involved in (21) and 
Li is more robust to the outliers than L 2 . Other possible methods include using 
penalized method such as LASSO or SCAD to explore the sparsity. For example, one 
can minimize 

j2(z l -^-hjwy + j2px(\^\) 

i=i i=i 

with respect to {yUj}f =1 and W, where px(-) is a folded-concave penalty function (Fan 
and Li, 2001). 



The estimator (20) performs significantly better than Efron (2007) 's estimator in 
our simulation studies. One difference is that in our setting S is known. The other 
is that we give a better approximation as shown in Section 3.4. 

Efron (2007) proposed the concept of conditional FDR. Consider E(V(t))/R(t) 
as one type of FDR definitions (see Efron (2007) expression (46)). The numerator 
E(V(t)) is over replications of the experiment, and equals a constant pot. But if the 
actual correlation structure in a given experiment is taken into consideration, then 
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Efron (2007) defines the conditional FDR as E(V (t)\A) / R(t) where A is a random 
variable which measures the dependency information of the test statistics. Estimating 
the realized value of A in a given experiment by A, one can have the estimated con- 
ditional FDR as E{V{t)\A)/R{t). Following Efron's proposition, Friguet et al (2009) 
gave the estimated conditional FDR by E(V (t)\W) / R(t) where W is an estimate of 
the realized random factors W in a given experiment. 



Our estimator in (20) for the realized FDP in a given experiment can be under- 



stood as an estimate of conditional FDR. Note that (18) is actually E(V(t)\{rji}? =l ). 
By Proposition 2, we can approximate V(t) by E{V{t)\{r]i}^ =l ). Thus the estimate 
of conditional FDR E(V {t)\{rji} p i=l ) / R{t) is directly an estimate of the realized FDP 
V(t)/R(t) in a given experiment. 

3.3 Asymptotic Justification 

Let w = (wi, ■ ■ ■ , Wk) T be the realized values of {W)i}^ =1 , and w be an estimator for 



w. We now show in Theorem 2 that FDP(t) in (20) based on a consistent estimator 



w has the same convergence rate as w under some mild conditions. 
Theorem 2. If the following conditions are satisfied: 

(CI) R(t)/p > H for H > as p -> oo ; 

(C2) mini<j< p min(|z t / 2 + bfw|, \z t/2 - bjw\) >r>0, 

(C3) || w — w|| 2 = O p (p~ r ) for some r > 0, 

then |FDP(t) - FDP A (*)| = 0(||w - w|| 2 ). 

In Theorem 2, (C2) is a reasonable condition because z t / 2 is a large negative 
number when threshold t is small and hf w is a realization from a normal distribution 
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mo, ELi t>l) with Yl =1 < 1- Thus z t / 2 + bf w or z t / 2 — bf w is unlikely close to 
zero. 

Theorem 3 shows the asymptotic consistency of L\— regression estimators under 



model (21). Portnoy (1984b) has proven the asymptotic consistency for robust re- 
gression estimation when the random errors are i.i.d. However, his proof does not 
work here because of the weak dependence of random errors. Our result allows k to 
grow with m, even at a faster rate of o(m 1/ ' 4 ) imposed by Portnoy (1984b). 



Theorem 3. Suppose (21) is a correct model. Let w be the L\— regression estimator: 

m 

w = argming eRfc \Z t - bj(3\ (22) 



i=l 



where bj = (bn, ■ ■ ■ , bik) T ■ Let w = (wi, • • • , Wk) T be the realized values of {Wh\h=\- 
Suppose k = 0(m K ) for < k < 1 — 8. Under the assumptions 

(C4) Y J U + i^<Vforr 1 = 0{m^), 
(C5) 

m 

lim sup mT 1 7 7(|bfu| < d) — 



m^oo j| U || =1 



i=l 



for a constant d > 0. 

( C6 ) a max / a m i n < S for some constant S when m — > 00 where l/a^ is the standard 
deviation of K i; 

(CI) a min = 0(m( 1 - K )/ 2 ). 



We have ||w — w|| 2 = O p yy — / 



(C4) is stronger than (CO) in Theorem 1 as (CO) only requires Y^=k+i = 
0{p 2 ~ 2& ). (C5) ensures the identifiability of f3, which is similar to Proposition 3.3 in 
Portnoy (1984a). (C6) and (C7) are imposed to facilitate the technical proof. 
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We now briefly discuss the role of the factor k. To make the approximation in 
Theorem 1 hold, we need k to be large. On the other hand, to make the realized factors 
estimable with reasonably accuracy, we hope to choose a small k as demonstrated in 
Theorem 3. Thus, the practical choice of k should be done with care. 

Since m is chosen as a certain large proportion of p, combination of Theorem 2 and 
Theorem 3 thus shows the asymptotic consistency of FDP(t) based on L\— regression 



estimator of w = (wi, ■ ■ ■ , Wk) in model (21 ) 



|FDP(t)-FDP A (t)| = O p (J-; 

V m 



The result in Theorem 3 are based on the assumption that (21 ) is a correct model. 



In the following, we will show that even if (21) is not a correct model, the effects of 
misspecification are negligible when p is sufficiently large. To facilitate the mathe- 
matical derivations, we instead consider the least-squares estimator. Suppose we are 



estimating W = (W\, ■ ■ ■ , Wk) T from (10). Let X be the design matrix of model 
(10). Then the least-squares estimator for W is W LS = W + (X T X)- 1 X T (/z + K), 
where /x = (/zi, • • • , fj, p ) T and K = (Ki, • • • , K P ) T . Instead, we estimate W\, ■ ■ ■ , Wk 



based on the simplified model (21), which ignores sparse {/z;}. Then the least-squares 
estimator for W is W LS = W+ (X T X) _1 X T K = W, in which we utilize the orthog- 
onality between X and var(K). The following result shows that the effect of misspec- 



ification in model (21) is negligible when p — >■ oo, and consistency of the least-squares 
estimator. 

Theorem 4. The bias due to ignoring non-nulls is controlled by 

k 

||W LS - W|| 2 = ||W LS - W* s || 2 < ||ax|| 2 (^ A,- 1 



1/2 



i=l 



In Theorem 1, we can choose appropriate k such that > cp 1//2 as noted proceed- 
ing Theorem 1. Therefore, J^ i=1 Aj" 1 — > as p — > oo is a reasonable condition. When 



{/i«}f =1 are truly sparse, it is expected that \\nW2 grows slowly or is even bounded so 
that the bound in Theorem 4 is small. For L\— regression, it is expected to be even 
more robust to the outliers in the sparse vector {/ij}f =1 . 



3.4 Dependence- Adjusted Procedure 

A problem of the method used so far is that the ranking of statistical significance is 
completely determined by the ranking of the test statistics {|^|}- This is undesirable 
and can be inefficient for the dependent case: the correlation structure should also 
be taken into account. We now show how to use the correlation structure to improve 
the signal to noise ratio. 



Note that by (10), Z { - bfW ~ iV(/i;,ar 2 ) where <2j is defined in Theorem 1. 
Since a" 1 < 1, the signal to noise ratio increases, which makes the resulting pro- 
cedure more powerful. Thus, if we know the true values of the common factors 
W = (Wx, ■ ■ ■ , Wk) T ', we can use a^Zi — hJW) as the test statistics. The dependence- 
adjusted p-values 2$(— |oj(Zj — bfW)|) can then be used. Note that this testing 
procedure has different thresholds for different hypotheses based on the magnitude of 
Zi, and has incorporated the correlation information among hypotheses. In practice, 
given Zi, the common factors {Wft}* =1 are realized but unobservable. As shown in 
section 3.2, they can be estimated. The dependence adjusted p-values are then given 
by 

2<t>(-\a l (Z l -bJW)\) (23) 

for ranking the hypotheses where W = (W\, • • • , Wk) T is an estimate of the principal 
factors. We will show in section 5 by simulation studies that this dependence-adjusted 
procedure is more powerful. The "factor adjusted multiple testing procedure" in 
Friguet et al (2009) shares a similar idea. 
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3.5 Relation with Other Methods 



Efron (2007) proposed a novel parametric model for V(t): 



V{t) = Po t 



2A 



'-Z t / 2 )<P{Zt/2 

V2t 



(24) 



where A ~ N(0, a 2 ) for some real number a and </>(•) stands for the probability density 
function of the standard normal distribution. The correlation effect is explained by 
the dispersion variate A. His procedure is to estimate A from the data and use 



Pot 



1 + 2A- 



-Z t / 2 )(j)(z t /2 

y/2t 



R(t) 



(25) 



as an estimator for realized FDP(t). Note that the above expressions are adapta- 
tions from his procedure for the one-sided test to our two-sided test setting. In his 
simulation, the above estimator captures the general trend of the FDP, but it is not 
accurate and deviates from the true FDP with large amount of variability. Consider 



our estimator FDP(i) in (20). Write % = a { Qi where Qi ~ iV(0, 1). When a { — > for 
V? G {true null}, by the second order Taylor expansion, 



FDP(t) 



Pot 
R(t)l 



1+ E 

i£ {true null} 



1 , (-Zt/2)(p(z t/2 ) 

1 Pot 



By comparison with Efron's estimator, we can see that 



.4 



V2Po, 



i£{true null} 



Thus, our method is more general. 



(26) 



(27) 



Leek & Storey (2008) considered a general framework for modeling the depen- 
dence in multiple testing. Their idea is to model the dependence via a factor model 
and reduces the multiple testing problem from dependence to independence case via 
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accounting the effects of common factors. They also provided a method of estimating 
the common factors. In contrast, our problem is different from Leek & Storey's and 
we estimate common factors from principal factor approximation and other methods. 
In addition, we provide the approximated FDP formula and its consistent estimate. 

Friguet, Kloareg & Causeur (2009) followed closely the framework of Leek & 
Storey (2008). They assumed that the data come directly from a multifactor model 
with independent random errors, and then used the EM algorithm to estimate all the 
parameters in the model and obtained an estimator for FDP(t). In particular, they 



subtract rfc out of (13) based on their estimate from the EM algorithm to improve the 
efficiency However, the ratio of estimated number of factors to the true number of 
factors in their studies varies according to the dependence structures by their EM al- 
gorithm, thus leading to inaccurate estimated FDP(t). Moreover, it is hard to derive 
theoretical results based on the estimator from their EM algorithm. Compared with 
their results, our procedure does not assume any specific dependence structure of the 
test statistics. What we do is to decompose the test statistics into an approximate 
factor model with weakly dependent errors, derive the factor loadings and estimate 
the unobserved but realized factors by Li-regression. Since the theoretical distribu- 



tion of V(t) is known, estimator (20) performs well based on a good estimation for 
Wi,-.. ,W k . 



4 Approximate Estimation of FDR 

In this section we propose some ideas that can asymptotically control the FDR, not 
the FDP, under arbitrary dependency. Although their validity is yet to be established, 
promising results reveal in the simulation studies. Therefore, they are worth some 
discussion and serve as a direction of our future work. 

Suppose that the number of false null hypotheses pi is known. If the signal \Xi for 
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i G {false null} is strong enough such that 



®\a*{zt/2 + Vi + A**)) +&(<k(zt/2 -Vi-fr 
then asymptotically the FDR is approximately given by 

FDR(t) = E 



(2f 





=1 


$(Oi(zt/2 + Vi)) + ®( a i{ Z t/2 - Vi)) 






$ 


{ai(z t / 2 + Vt)) + ®{ai(z t /2 - Vi)) 


+ Pi 



(29) 



which is the expectation of a function of W±, • • • , Wk- Note that FDR(t) is a known 
function and can be computed by Monte Carlo simulation. For any predetermined 
error rate a, we can use the bisection method to solve t so that FDR(t) = a. Since k 
is not large, the Monte Carlo computation is sufficiently fast for most applications. 



The requirement (28) is not very strong. First of all, $(3) ~ 0.9987, so (28) will 
hold if any number inside the $(•) is greater than 3. Secondly, 1 — Ylh=i ^ih ^ s usua Uy 
very small. For example, if it is 0.01, then a« = (1 — Xlh=i ^k) ~ 10, which means 
that if either z t /2 + Vi + A*i or z t/2 



Vi — ^ exceed 0.3, then (28) is approximately 



satisfied. Since the effect of sample size n is involved in the problem in Section 2, 



(28) is not a very strong condition on the signal strength 



Note that Finner et al (2007) considered a "Dirac uniform model" , where the p- 
values corresponding to a false hypothesis are exactly equal to 0. This model might 



be potentially useful for FDR control. The calculation of (29) requires the knowledge 
of the proportion px of signal in the data. Since p\ is usually unknown in practice, 
there is also future research interest in estimating p\ under arbitrary dependency. 
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5 Simulation Studies 



In the simulation studies, we consider p = 2000, n = 100, a — 2, the number of false 
null hypotheses p\ = 10 and the nonzero = 1, unless stated otherwise. We will 
present 6 different dependence structures for S of the test statistics (Z ± , • • • , Z P ) T ~ 
iV((/xi, • • • ,/i p ) T , £). Following the setting in section 2, £ is the correlation matrix 
of a random sample of size n of dimensional vector X, = (Xn,--- ,X ip ), and 
fij = y/n(3jdj/<r, j — 1, • • • ,p. The data generating process vector Xj's are as follows. 

• [Equal correlation] Let X T = (X 1: ■ ■ ■ , X P ) T ~ iV p (0, S) where £ has diago- 
nal element 1 and off-diagonal element 1/2. 

• [Fan & Song's model] For X = (X u ■ ■ ■ ,X p ), let {X k }^ be i.i.d. iV(0, 1) 
and 

10 / — To 

X k = Y^X l {-l) l + 1 /h + ^l--e k , fc = 1901,-.. ,2000, 

where {efc}fc=i9oi are standard normally distributed. 

• [Independent Cauchy] For X = (X u ■ ■ ■ ,X P ), let {X k }f°° be i.i.d. Cauchy 
random variables with location parameter and scale parameter 1. 

• [Three factor model] For X = (X i: ■ ■ ■ ,X P ), let 

X, = pfwW + pfw™ + pfw® + H v 

where ~ N(-2, 1), ~ N(l, 1), ~ iV(4, 1), pf\pf\pf are i.i.d. 
C/(-l, 1), and Hj are i.i.d. JV(0, 1). 

• [Two factor model] For X = (X u ■ ■ ■ , X p ), let 

X 3 =pfw^+pfw^ + H 3 , 
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where and are i.i.d. N(0, 1), pf and pf are i.i.d. U{-1, 1), and Hj 
are i.i.d. iV(0,l). 

• [Nonlinear factor model] For X = (Xx, ■ ■ ■ ,X P ), let 

X 3 = sm(pf\V w ) + sgn(pf) expflpf \W&) + Hj, 

where W^ 1 ) and W (2) are i.i.d. iV(0, 1), pf and pf ) are i.i.d. U{-1, 1), and 
are i.i.d. N(0,1). 

Fan & Song's Model has been considered in Fan & Song (2010) for high dimen- 
sional variable selection. This model is close to the independent case but has some 
special dependence structure. Note that although we have used the term "factor 
model" above to describe the dependence structure, it is not the factor model for the 
test statistics Zi, ■ ■ ■ ,Z p directly. The covariance matrix of these test statistics is the 
sample correlation matrix of Xi, ■ ■ ■ , X p . 

The effectiveness of our method is examined in several aspects. We first examine 
the goodness of approximation in Theorem 1 by comparing the marginal distributions 
and variances. We then compare the accuracy of FDP estimates with other methods. 
Finally, we demonstrate the improvement of the power with dependence adjustment. 

Distributions of FDP and its approximation: Without loss of generality, we 
consider a dependence structure based on the two factor model above. Let n = 100, 
Pi = 10 and a = 2. Let p vary from 100 to 1000 and t be either 0.01 or 0.005. The 
distributions of FDP(i) and its approximated expression in Theorem 1 are plotted in 
Figure [TJ The convergence of the distributions are self-evidenced. Table 2 summarizes 
the total variation distance between the two distributions. 

Variance of V(t): Variance of false discoveries in the correlated test statistics 
is usually large compared with that of the independent case which is pot(l — t), 
due to correlation structures. Thus the ratio of variance of false discoveries in the 
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p=100 
t=0.01 



p=100 
t=0.005 



p=100 
t=0.005 



p=500 
t=0.01 



p=500 
t=0.01 



p=500 
t-0.005 



p=500 
t=0.005 



0.0 0.2 0.4 0.6 0.8 1.0 



p=1000 
t=0.01 



p=1000 
t=0.01 



p=1000 
t=0.005 



p=1000 
t=0.005 



Figure 1: Comparison for the distribution of the FDP with that of its approximated 
expression, based on the two factor model over 10000 simulations. From the top row 
to the bottom, p = 100, 500, 1000 respectively. The first two columns correspond to 
t = 0.01 and the last two correspond to t — 0.005. 

Table 2: Total variation distance between the distribution of FDP and the limiting 
distribution of FDP in Figure 1. The total variation distance is calculated based on 
"TotalVarDist" function with "smooth" option in R software. 





p = 100 


p = 500 


p = 1000 


t = 0.01 


0.6668 


0.1455 


0.0679 


t = 0.005 


0.6906 


0.2792 


0.1862 



dependent case to that in the independent test statistics can be considered as a 
measure of correlation effect. See Owen (2005). Estimating the variance of false 



discoveries is an interesting problem. With approximation (16), this can easily be 
computed. In Table |3j we compare the true variance of the number of false discoveries, 



the variance of expression (18) (which is infeasible in practice) and the variance of 



expression (17) under 6 different dependence structures. It shows that the variance 



computed based on expression (17) approximately equals the variance of number of 
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false discoveries. Therefore, we provide a fast and alternative method to estimate 
the variance of number of false discoveries in addition to the results in Owen (2005). 
Note that the variance for independent case is merely less than 2. The impact of 
dependence is very substantial. 



Table 3: Comparison for variance of number of false discoveries (column 2), variance of 
expression (18) (column 3) and variance of expression (17) (column 4) with t = 0.001 



based on 10000 simulations. 



Dependence Structure 


var(y(*)) 


var(V^) 


va.iiV.up) 


Equal correlation 


180.9673 


178.5939 


180.6155 


Fan & Song's model 


5.2487 


5.2032 


5.2461 


Independent Cauchy 


9.0846 


8.8182 


8.9316 


Three factor model 


81.1915 


81.9373 


83.0818 


Two factor model 


53.9515 


53.6883 


54.0297 


Nonlinear factor model 


48.3414 


48.7013 


49.1645 



Table 4: Comparison of FDP values for our method based on equation (29) without 
taking expectation (PFA) with Storey's procedure and Benjamini-Hochberg's proce- 
dure under six different dependence structures, where p = 2000, n = 200, t = 0.001, 
and ^ = 1 for % e {false null}. The computation is based on 10000 simulations. The 
means of FDP are listed with the standard deviations in the brackets. 





True FDP 


PFA 


Storey 


B-H 


Equal correlation 


6.67% 


6.61% 


2.99% 


3.90% 




(15.87%) 


(15.88%) 


(10.53%) 


(14.58%) 


Fan & Song's model 


14.85% 


14.85% 


13.27% 


14.46% 




(11.76%) 


(11.58%) 


(11.21%) 


(13.46%) 


Independent Cauchy 


13.85% 


13.62% 


11.48% 


13.21% 




(13.60%) 


(13.15%) 


(12.39%) 


(15.40%) 


Three factor model 


8.08% 


8.29% 


4.00% 


5.46% 




(16.31%) 


(16.39%) 


(11.10%) 


(16.10%) 


Two factor model 


8.62% 


8.50% 


4.70% 


5.87% 




(16.44%) 


(16.27%) 


(11.97%) 


(16.55%) 


Nonlinear factor model 


6.63% 


6.81% 


3.20% 


4.19% 




(15.56%) 


(15.94%) 


(10.91%) 


(15.31%) 



Comparing methods of estimating FDP: Under different dependence struc- 



tures, we compare FDP values using our procedure PFA in equation ( 29 ) without tak 
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ing expectation and withpi known, Storey's procedure withpi known ((l—pi)t/R(t)) 
and Benjamini-Hochberg procedure. Note that Benjamini-Hochberg procedure is a 
FDR control procedure rather than a FDP estimating procedure. The Benjamini- 
Hochberg FDP is obtained by using the mean of "True FDP" in Table [4] as the 
control rate in B-H procedure. Table [4] shows that our method performs much better 
than Storey's procedure and Benjamini-Hochberg procedure, especially under strong 
dependence structures (rows 1, 4, 5, and 6), in terms of both mean and variance of 
the distribution of FDP. Recall that the expected value of FDP is the FDR. Table 3 
also compares the FDR of three procedures by looking at the averages. Note that the 
actual FDR from B-H procedure under dependence is much smaller than the control 
rate, which suggests that B-H procedure can be quite conservative under dependence. 



Equal Correlation 



Fan & Song's Model 



Independent Cauchy 




0.0 0.1 0.2 0.3 0.4 0.5 
False Discovery Proportion 



Three Factor Model 




0.0 0.1 0.2 0.3 0.4 

False Discovery Proportion 



Two Factor Model 

















7w WW WW? VW v 



0.1 0.2 0.3 



0.0 0.1 0.2 0.3 0.4 0.5 0.6 

False Discovery Proportion 



Nonlinear Factor Model 




False Discovery Proportion 



False Discovery Proportion 



False Discovery Proportion 



Figure 2: Comparison of true values of False Discovery Proportion with estimated 
FDP by Efron (2007) 's procedure (grey crosses) and our PEA method (black dots) 
under six different dependence structures, with p = 1000, pi = 50, n = 100, a = 2, t = 
0.005 and f3i = 1 for i e {false null} based on 1000 simulations. The Z-statistics with 
absolute value less than or equal to x = 1 are used to estimate the dispersion variate 
A in Efron (2007) 's estimator. The unconditional estimate of FDR(t) is pot/R(t) 
shown as green triangles. 
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Figure 3: Histograms of the relative error (RE) between true values of FDP and 
estimated FDP by our PFA method under the six dependence structures in Figure [2j 
RE is defined as (FDP(t) - FDP (t)) /FDP (t) if FDP(t) ^ and otherwise. 



Comparison with Efron's Methods: We now compare the estimated values 



of our method PFA (20) and Efron (2007) 's estimator with true values of false dis- 
covery proportion, under 6 different dependence structures. Efron (2007)'s estimator 
is developed for estimating FDP under unknown S. In our simulation study, we have 
used a known E for Efron's estimator for fair comparisons. The results are depicted 
in Figure |2j Figure [3] and Table [5} Figure [2] shows that our estimated values correctly 
track the trends of FDP with smaller amount of noise. It also shows that both our 
estimator and Efron's estimator tend to overestimate the true FDP, since FDPa(£) 
is an upper bound of the true FDP(t). They are close only when the number of false 
nulls pi is very small. In the current simulation setting, we choose p\ = 50 compared 
with p = 1000, therefore, it is not a very sparse case. However, even under this 
case, our estimator still performs very well for six different dependence structures. 
Efron (2007) 's estimator is illustrated in Figure [2] with his suggestions for estimating 
parameters, which captures the general trend of true FDP but with large amount of 
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noise. Figure [3] shows that the relative errors of PFA concentrate around 0, which 
suggests good accuracy of our method in estimating FDP. Table [5] summarizes the 
relative errors of the two methods. 



Table 5: Means and standard deviations of the relative error between true values 
of FDP and estimated FDP under the six dependence structures in Figure |2j REp 
and REe are the relative errors of our PFA estimator and Efron (2007)'s estimator, 





REp 


REe 


mean SD 


mean SD 


Equal correlation 
Fan & Song's model 
Independent Cauchy 
Three factor model 
Two factor model 
Nonlinear factor model 


0.0241 0.1262 
0.0689 0.1939 
0.0594 0.1736 
0.0421 0.1657 
0.0397 0.1323 
0.0433 0.1648 


1.4841 3.6736 
1.2521 1.9632 
1.3066 2.1864 
1.4504 2.6937 
1.1227 2.0912 
1.3134 4.0254 



Dependence- Adjusted Procedure: We compare the dependence-adjusted pro- 
cedure described in section 3.4 with the testing procedure based only on the observed 
test statistics without using correlation information. The latter is to compare the 
original z-statistics with a fixed threshold value and is labeled as "fixed threshold 
procedure" in Table [6] With the same FDR level, a procedure with smaller false 
nondiscovery rate (FNR) is more powerful, where FNR = E[T/(p — R)] using the 
notation in Table 1. 

In Table [6j without loss of generality, for each dependence structure we fix thresh- 



old value 0.001 and reject the hypotheses when the dependence-adjusted p- values (36) 
is smaller than 0.001. Then we find the corresponding threshold value for the fixed 
threshold procedure such that the FDR in the two testing procedures are approxi- 
mately the same. The FNR for the dependence- adjusted procedure is much smaller 
than that of the fixed threshold procedure, which suggests that dependence-adjusted 
procedure is more powerful. Note that in Table |6j p\ = 200 compared with p = 1000, 
implying that the better performance of the dependence-adjusted procedure is not 
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Table 6: Comparison of Dependence-Adjusted Procedure with Fixed Threshold Pro- 
cedure under six different dependence structures, where p = 1000, n = 100, a — 1, 
Pi = 200, nonzero Pi simulated from U(0, 1) and k = n — 3 over 1000 simulations. 





Fixed Threshold Procedure 


Dependence- Adjusted Procedure 




FDR 


FNR 


Threshold 


FDR 


FNR 


Threshold 


Equal correlation 


17.06% 


4.82% 


0.06 


17.34% 


0.35% 


0.001 


Fan & Song's model 


6.69% 


6.32% 


0.0145 


6.73% 


1.20% 


0.001 


Independent Cauchy 


7.12% 


0.45% 


0.019 


7.12% 


0.13% 


0.001 


Three factor model 


5.46% 


3.97% 


0.014 


5.53% 


0.31% 


0.001 


Two factor model 


5.00% 


4.60% 


0.012 


5.05% 


0.39% 


0.001 


Nonlinear factor model 


6.42% 


3.73% 


0.019 


6.38% 


0.68% 


0.001 



limited to sparse situation. This is expected since subtracting common factors out 
make the problem have a higher signal to noise ratio. 



6 Real Data Analysis 

Our proposed multiple testing procedures are now applied to the genome-wide asso- 
ciation studies, in particular the expression quantitative trait locus (eQTL) mapping. 
It is known that the expression levels of gene CCT8 are highly related to Down Syn- 
drome phenotypes. In our analysis, we use over two million SNP genotype data and 
CCT8 gene expression data for 210 individuals from three different populations, test- 
ing which SNPs are associated with the variation in CCT8 expression levels. The 
SNP data are from the International HapMap project, which include 45 Japanese 
in Tokyo, Japan (JPT), 45 Han Chinese in Beijing, China (CHB), 60 Utah resi- 
dents with ancestry from northern and western Europe (CEU) and 60 Yoruba in 
Ibadan, Nigeria (YRI). The Japanese and Chinese population are further grouped 
together to form the Asian population (JPTCHB). To save space, we omit the de- 
scription of the data pre-processing procedures. Interested readers can find more de- 



tails from the websites: http://pngu.mgh.harvard.edu/~purcell/plink/res.shtml and 
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ftp://ftp.sanger.ac.uk/pub/genevar/, and the paper Bradic, Fan & Wang (2010). 



We further introduce two sets of dummy variables (di, d 2 ) to recode the SNP data, 
where di = (di t i, • • • ,di jP ) and d 2 = (c? 2 ,i,--- ,d 2 ^ p ), representing three categories 
of polymorphisms, namely, (dij,d 2 j) = (0,0) for SNP.,- = (no polymorphism), 
(dij, d 2 j) = (1,0) for SNPj = 1 (one nucleotide has polymorphism) and (dij,d 2 j) = 
(0,1) for SNPj = 2 (both nucleotides have polymorphisms). Let {y*}™ =1 be the 
independent sample random variables of Y, {d\j}™ , =1 and {d 2 ^ =1 be the sample 
values of di j and d 2 j respectively. Thus, instead of using model ([I]), we consider two 
marginal linear regression models between {y*}™ =1 and {d\j}^ =1 : 



mm 



1 n 

- ^2 — a i,j — Pi,jd[ j) 2 , j = l,---,p (30) 



i=i 



and between {F*}" =1 and {d l 2 7 }" =1 : 



1 " 



mm 



^E{Y l -a 2tj -^d^)\ j = !,•••, p. (31) 



i=i 

For ease of notation, we denote the recoded n x 2p dimensional design matrix as X. 
The missing SNP measurement are imputed as and the redundant SNP data are 
excluded. Finally, the logarithm-transform of the raw CCT8 gene expression data are 
used. The details of our testing procedures are summarized as follows. 

• To begin with, consider the full model Y = a + X/3 + e, where Y is the CCT8 
gene expression data, X is the n x 2p dimensional design matrix of the SNP 
codings and ~ N(0, a 2 ), i = 1, • • • , n are the independent random errors. We 
adopt the refitted cross-validation (RCV) (Fan, Guo & Hao 2010) technique to 
estimate a by a, where LASSO is used in the first (variable selection) stage. 



Fit the marginal linear models (30 ) and (31 ) for each (recoded) SNP and obtain 
the least-squares estimate (3j for j = 1, • • - , 2p. Compute the values of Z- 
statistics using formula ([3]), except that a is replaced by a. 
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Calculate the P-values based on the Z-statistics and compute R(t) = #{Pj : 
Pj < t} for a fixed threshold t. 

Apply eigenvalue decomposition to the population covariance matrix X of the Z- 
statistics. By Proposition 1, £ is the sample correlation matrix of c?2,i, ■ ■ ■ , 
di, p , c?2,p) T - Determine an appropriate number of factors k and derive the cor- 
responding factor loading coefficients {b ih y~^ p, j l l ~^ . 

Order the absolute-valued Z-statistics and choose the first m = 95% x 2p of 



them. Apply Li-regression to the equation set (21) and obtain its solution 



Wi, • • • , W k . Plug them into (20) and get the estimated FDP(t). 



For each intermediate step of the above procedure, the outcomes are summarized in 
the following figures. Figure [4] illustrates the trend of the RCV-estimated standard 
deviation a with respect to different model sizes. Our result is similar to that in 
Fan, Guo & Hao (2010), in that although a is influenced by the selected model size, 
it is relatively stable and thus provides reasonable accuracy. The empirical distri- 
butions of the Z-values are presented in Figure |5j together with the fitted normal 
density curves. As pointed out in Efron (2007, 2010), due to the existence of de- 
pendency among the Z-values, their densities are either narrowed or widened and 
are not N(0, 1) distributed. The histograms of the P-values are further provided in 
Figure [6j giving a crude estimate of the proportion of the false nulls for each of the 
three populations. The main results of our analysis are presented in Figures [7j 
which depicts the number of total discoveries R(t), the estimated number of false 
discoveries V(t) and the estimated False Discovery Proportion FDP(t) as functions 
of (the minus log 10 -transformed) thresholding t for the three populations. As can be 
seen, in each case both R(t) and V(t) are decreasing when t decreases, but FDP(t) 
exhibits zigzag patterns and does not always decrease along with t, which results from 
the cluster effect of the P-values. A closer study of the outputs further shows that for 
all populations, the estimated FDP has a general trend of decreasing to the limit of 
around 0.1 to 0.2, which backs up the intuition that a large proportion of the smallest 
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Figure 4: a of the three populations with respect to the selected model sizes, derived 
by using refitted cross-validation (RCV). 



CEU 




JPTCHB 




YRI 




Figure 5: Empirical distributions and fitted normal density curves of the Z- values 
for each of the three populations. Because of dependency, the Z-values are no longer 
N(0, 1) distributed. The empirical distributions, instead, are iV(0.12, 1.22 2 ) for CEU, 
iV(0.27, 1.39 2 ) for JPT and CHB, and JV(-0.04, 1.66 2 ) for YRI, respectively. The 
density curve for CEU is closest to N(0, 1) and the least dispersed among the three. 



P-values should correspond to the false nulls (true discoveries) when Z-statistics is 
very large; however, in most other thresholding values, the estimated FDPs £1X6 dbt> 8b 
high level. This is possibly due to small signal-to-noise ratios in eQTL studies. 

The results of the selected SNPs, together with the estimated FDPs, are depicted 
in Table [7j It is worth mentioning that Deutsch et al. (2005) and Bradic, Fan & Wang 
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Figure 6: Histograms of the P-values for each of the three populations. 

(2010) had also worked on the same CCT8 data to identify the significant SNPs in 
CEU population. Deutsch et al. (2005) performed association analysis for each SNP 
using ANOVA, while Bradic, Fan & Wang (2010) proposed the penalized composite 
quasi-likelihood variable selection method. Their findings were different as well, for 
the first group identified four SNPs (exactly the same as ours) which have the smallest 
P-values but the second group only discovered one SNP rs965951 among those four, 
arguing that the other three SNPs make little additional contributions conditioning 
on the presence of rs965951. Our results for CEU population coincide with that of 
the latter group, in the sense that the false discovery rate is high in our findings and 
our association study is marginal rather than joint modeling among several SNPs. 

Table 7: Information of the selected SNPs and the associated FDP for a particular 
threshold. Note that the density curve of the Z-values for CEU population is close 
to A r (0, 1), so the approximate FDP(t) equals pt/R(t) 0.631. Therefore our high 
estimated FDP is reasonable. 



Population 


Threshold 


# Discoveries 


Estimated FDP 


Selected SNPs 


JPTCHB 


1.61 x 10~ 9 


5 


0.1535 


rs965951 rs2070611 
rs2832159 rs8133819 
rs2832160 


YRI 


1.14 x 10" 9 


2 


0.2215 


rs9985076 rs965951 


CEU 


6.38 x 10~ 4 


4 


0.8099 


rs965951 rs2832159 
rs8133819 rs2832160 
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Figure 7: Number of total discoveries, estimated number of false discoveries and esti- 
mated False Discovery Proportion as functions of thresholding t for CEU population 
(row 1), JPT and CHB (row 2) and YRI (row 3). The re-coordinate is — logt, the 
minus log 10 -transformed thresholding. 
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Table 8: Information of the selected SNPs for a particular threshold based on the 
dependence-adjusted procedure. The number of factors k in (36) equals 10. The 
estimated FDP is based on estimator (20) by applying PFA to the adjusted Z- values. 
* is the indicator for SNP equal to 2 and otherwise is the indicator for 1. 



Population 


Threshold 


# Discoveries 


Estimated FDP 


Selected SNPs 


JPTCHB 


2.89 x 10" 4 


5 


0.1205 


rs965951 rs2070611 
rs2832159 rs8133819 
rs2832160 


YRI 


8.03 x 10" 5 


4 


0.2080 


rs7283791 rsl 1910981 
rs8128844 rs965951 


CEU 


5.16 x 10" 2 


6 


0.2501 


rs464144* rs4817271 
rs2832195 rs2831528* 
rsl571671* rs6516819* 



Table [8] lists the SNP selection based on the dependence-adjusted procedure. For 
JPTCHB, with slightly smaller estimated FDP, the dependence-adjusted procedure 
selects the same SNPs with the group selected by the fixed-threshold procedure, 
which suggests that these 5 SNPs are significantly associated with the variation in 
gene CCT8 expression levels. For YRI, rs965951 is selected by the both procedures, 
but the dependence-adjusted procedure selects other three SNPs which do not appear 
in Table [7} For CEU, the selections based on the two procedures are quite different. 
However, since the estimated FDP for CEU is much smaller in Tableland the signal- 
to-noise ratio of the test statistics is higher from the dependence-adjusted procedure, 
the selection group in Table [8] seems more reliable. 



7 Discussion 



We have proposed a new method (principal factor approximation) for high dimen- 
sional multiple testing where the test statistics have an arbitrary dependence struc- 
ture. For multivariate normal test statistics with a known covariance matrix, we can 
express the test statistics as an approximate factor model with weakly dependent ran- 
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dom errors, by applying spectral decomposition to the covariance matrix. We then 
obtain an explicit expression for the false discovery proportion in large scale simul- 
taneous tests. This result has important applications in controlling FDP and FDR. 
We also provide a procedure to estimate the realized FDP, which, in our simulation 
studies, correctly tracks the trend of FDP with smaller amount of noise. 

To take into account of the dependence structure in the test statistics, we propose 
a dependence- adjusted procedure with different threshold values for magnitude of Zi 
in different hypotheses. This procedure has been shown in simulation studies to be 
more powerful than the fixed threshold procedure. An interesting research question 
is how to take advantage of the dependence structure such that the testing procedure 
is more powerful or even optimal under arbitrary dependence structures. 

While our procedure is based on a known correlation matrix, we would expect 
that it can be adapted to the case with estimated covariance matrix. The question 
is then how accuracy the covariance matrix should be so that a simple substitution 
procedure will give an accurate estimate of FDP. 

We provide a simple method to estimate the realized principal factors. A more 
accurate method is probably the use of penalized least-squares method to explore the 
sparsity and to estimate the realized principle factor. 

8 Appendix 

Lemma 1 is fundamental to our proof of Theorem 1 and Proposition 2. The result is 
known in probability, but has the formal statement and proof in Lyons (1988). 

Lemma 1 (Strong Law of Large Numbers for Weakly Correlated Variables). Let 

{X n }™ ! be a sequence of real-valued random variables such that E\X n \ 2 < 1. If 
\X n \<la.s. andY jN >ijjE\^Y, n <N X n\ 2 < °°> then lim ^oo j, E„<iv X n = a.s.. 
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Proof of Proposition 2: Note that Pi = 2<&(—\Zi\). Based on the expression of 
{Zxi ■ ■ ■ , Z p ) T in (10), |/(-Pj < t\Wx, ■ ■ ■ , Wfc) j are dependent random variables. 
Nevertheless, we want to prove 

v 

p- 1 Y^[I{Pi < t\ W u ■ ■ ■ , W k ) - P{Pi < t\ W u ■ ■ ■ , W k )\ P ^ a.s.. (32) 

i=l 

Letting X { = I(P; < t\W x , • • • , W k ) - P(Pi < t\W 1} • • • , W k ), by Lemma 1 the con- 



clusion (32) is correct if we can show 



p 

Var (V 1 t( p i < t\Wi, ■ ■ ■ , W k )J = O p (p- & ) for some 5 > 0. 

i=l 

To begin with, note that 

p 

i=i 

P 

= p- 2 ^Vai(l(Pi<t\Wi,--- ,W k )) 
i=i 

+2p- 2 Cov ( J (^ <*|WV" .Wfc),/^ <t|WV 



Since Varf J(Pj < £|Wi, ■ ■ ■ , W k )) < |, the first term in the right-hand side of the last 
equation is O p (p~ 1 ). For the second term, the covariance is given by 

P{Pi < t, P 3 < t\ W h ■ ■ ■ , W k ) - P(P l < t\ W u ■ ■ ■ , W k )P(P J < t\W x , ■ ■ ■ , W k ) 
= P{\Zi\ < -$- 1 (t/2),|Z J | < -<Z>~ l (t/2)\W h --- ,W k ) 

-PQZil < -$-\t/2)\W 1 , ■■■ , W k )P{\Z,\ < -$-\t/2)\W 1 , ■ ■ ■ , W k ) 

To simplify the notation, let p\, be the correlation between Ki and Kj. Without loss 
of generality, we assume > (for p*- < 0, the calculation is similar). Denote by 

Cl,i = Cli(-Zt/2 -T)i- fa), C 2 ,i = Oi(z t /2 -T)i- fa)- 
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Then, from the joint normality, it can be shown that 



P{\Zi\ < -$- i (t/2),|Z,| < -$- i (t/2)|^ 1 ,-- - ,W k ) 
P(c2,i/ai < Ki< ci ii /a i ,c 2 j/a j < Kj < c hj /a,j) 



x 



(1-P*) 1/2 

(l-p*) 1/2 



- $ 
- $ 



:i-p5) i/2 

(p&) 1/2 ^ + c 2 

(i-p6) 1/2 



(33) 



(z)dz. 



Next we will use Taylor expansion to analyze the joint probability further. We 
have shown that (K 1: ■ ■ ■ , if p ) T ~ -A/"(0, A) are weakly dependent random variables. 
Let cov^ denote the covariance of Ki and Kj, which is the (i,j)ih element of the 



covariance matrix A. We also let b^ = (1 — ^2 h=1 b 2 h Y^ 2 (l — Y%=i^ 
Holder inequality, 



^) 1/2 - 



By the 



p- 2 J2\cov*\^<p-^(J2\covU 2 ) 1/4 = 

i,j=l i,j=l 



p- 2 ( E ^ 

i=k+l 



/2 



1/4 



->■ 



as p — > oo. For each $(•), we apply Taylor expansion with respect to (covf ) 1 / 2 , 



./ (p&) 1/2 ^ + Ci,A 



= $ 



cov^z + ib^ci, 



(&£■ - ctwj-) 1/2 / 
= $(c lii ) + ^(c 1>i )(6j-)- 1/2 ^(«wS) 1/2 

+^(ci,i)ci >i (6f 1 -)- 1 (l - ^ 2 )co4 + i2(«n;*) 



where R(covfj) is the Lagrange residual term in the Taylor's expansion, and R(covfj) = 
f(z)0(\cov^\ 3 ^ 2 ) in which /(z) is a polynomial function of z with the highest order 
as 6. 
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Therefore, we have (33) equals 



$(c M ) - $(c 2)i )J [$( Clj ) - $(c 2j ) 

+ (0(c M ) - 0(c 2 ,)) (0( Clj ) - 0(c 2j )) ih%)~\ov% + Odco^l 3 / 2 ) 



where we have used the fact that J^zip^dz = 0, (1 — z 2 )(f)(z)dz = and 
the finite moments of standard normal distribution are finite. Now since P(\Zi\ < 
-$-\t/2)\W u • • • , W k ) = $(c M ) - $(c 2jl ), we have 



Cov(/(P 4 < t\W x , - ■ ■ , W k ), I(Pj < t\W lt ■ ■ ■ , W k ) 
0(ci,i) - 0(c 2 ,i)J (^(cij) - (p^j^aiajcovfj + 0(|cot»{j| 3/2 ). 



In the last line, {d>(c\^) — 0(c 2j j)) (0(cij) — 0(c 2j ))ajaj is bounded by some constant 
except on a countable collection of measure zero sets. Let Ci be defined as the set 
{zt/2 + Vi + l^i = 0} u - Vi - = °}- On the set C?, (</>(c Xii ) - 0(c 2)i ))aj converges 
to zero as — > oo. Therefore, (0(ci 5 j) — 0(c 2) j)) (</>(ci i: ,) — </>(c 2 j))aja.,- is bounded by 
some constant on (ljf =1 Cj) c . 

By the Cauchy-Schwartz inequality and (CO) in Theorem 1, p~ 2 ■ \cov\^ 
0(p~ 5 ). Also we have \covfA 3 / 2 < 



cof^ |. On the set (IJf=i Cj) c , we conclude that 



Var^- 1 ^^; <t\Wi,--- ,W k yj =O p (p- 5 ) 
i=i 



Hence by Lemma 1, for fixed (wi, ■ ■ ■ ,w k ) 



1=1 



j = wi, ■ ■ • , W k = w k )-P(P l < t\W t = w u ■ ■ ■ , W k = w k )} P ^ a.s. 

(34) 

If we define the probability space on which (Wi, ■ ■ ■ , W k ) and (Ki, • ■ ■ , K p ) are con- 
structed as in (10) to be (f^J 7 , z/), with J 7 and v the associated a— algebra and 
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(Lebesgue) measure, then in a more formal way, (34) is equivalent to 



p 

p- 1 {l{Pi(«>) < t\Wt = w u ■ ■ ■ , W k = w k )-P{Pi < t\W! = Wl , ■ ■ ■ , W k = w k )} p ^ 
i=i 

for each fixed (w\, ■ ■ ■ ,w k ) T and almost every uj e fl, leading further to 

p- 1 J2 { < *) - P(Pi < t\Wi(u), ■ ■ ■ , w k (uj))} ^ o 

i=l 

for almost every wGfi, which is the definition for 

p 

p- 1 ]T {I(P <t)- P{P <t\W u --- , W k )} ^ a.s.. 
i=i 

Therefore, 

p 

lim p -1 |l(P < t) - [$(ai{zt/2 + + fH)) + ^(0i(«t/2 - 77. - #))] } = a.s.. 
«=i 

With the same argument we can show 

limpo 1 !^^)- Yl [®( a i( z t/2 + Vi)) + ${ai(zt/2 -%))]} = a.s. 

iS{true null} 

for the high dimensional sparse case. The proof of Proposition 2 is now complete. 
Proof of Theorem 1: 

For ease of notation, denote Y^=i [®( a i( z t/2 + Vi + + &( a i( z t/2 —Vi ~ A 4 *))] as -^(0 
and Eie{tmc null} [$M*t/2 + Vi)) + ^(oi(«t/a - 7/j))] as K(t), then 
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lim (FDP(t) - ^ e{truc Dull} [ $ ( fli ^*/ 2 + ^ + $ ( a «(^/2 ~ ^))] 
l ELi [Hai(zt/2 + + + Hai(zt/2 -Vi~ Hi))] 

l R(t) R(t) i 

(V(t)/p )[(R(t) - R{t))/p\ + (R(t)/p)[(V(t) - V(t))/ Po ] 



= lim 
= a.s. 



R(t)R(t)/( PoP ) 



by the results in Proposition 2 and the fact that both p and p 1 R(t) are 
bounded random variables. The proof of Theorem 1 is complete. 

Proof of Theorem 2: Letting 



A 1 = [®(ai(zt/2 + bfw)) - $(a,(z t/2 + bfw)) 
A 2 = [ $ ( a *(^/2 - b fw)) - $(a^ t/2 - bfw)) J , 



and 



we have 



FDP(t) - FDP A (t) = (A x + A 2 )/R(t). 



Consider Ai = Y7i=i ^u- By the mean value theorem, there exists ^ in the interval of 
(bfw, bfw), such that An = 4>(ai(z t /2 + &))aibf(w — w) where (/)(■) is the standard 
normal density function. 

Next we will show that <f>(ai(zt/2 + £i)) a i is bounded by a constant. Without 
loss of generality, we discuss about the case in (C2) when z t / 2 + bfw < — t. By 
(C3), we can choose sufficiently large p such that z t / 2 + & < — r/2. For the function 
g(a) = exp(— a 2 x 2 /8)a, g(a) is maximized when a = 2/x. Therefore, 



2iT(j)(ai(z t / 2 +^))oj < a i exp(-a i r /8) < 2exp(-l/2)/r. 



42 



For z t /2 + bf w > t we have the same result. In both cases, we can use a constant D 
such that (f>(ai(zt/2 + Ci)) a i < D. 

By the Cauchy-Schwartz inequality, we have Y7i=i \hh\ < ip Y7%=i HhY^ = (P^h) 1 ^- 
Therefore, by the Cauchy-Schwartz inequality and the fact that Ylh=i ^ h < Pi we have 

p k 

I Ai| < [J^ \hh\\wh ~ w h \ 

i=l h=l 
k 

< DY,(P^h) 1/2 \w h -w h \ 

h=i 

< iVp(X^X^-<> 2 ) 

h=l 7i=l 

< Dp||w — w|| 2 . 

By (CI) in Theorem 2, R(t)/p > H for H > when p ->■ oo. Therefore, |Ai/i2(t)| = 
0(||w— w|| 2 ). For A 2 , the result is the same. The proof of Theorem 2 is now complete. 

Proof of Theorem 3: Without loss of generality, we assume that the true value of 
w is zero, and we need to prove ||w|| 2 = O p (yJ~^). Let L : R k — > R k be defined by 

m 

Lj(w) = m~ x ^2 bijSgn(Ki - bjw) 

i=l 

where sgn(x) is the sign function of x and equals zero when x — 0. Then we want to 
prove that there is a root w of the equation L(w) = satisfying || w||| = O p (k/m). By 
classical convexity argument, it suffices to show that with high probability, w T L(w) < 
with || w || 2 = Bk/m for a sufficiently large constant B. 

Let V = w T L(w) = m- 1 YZi V ii where V i = (bJw)sgn(Ki - bfw). By Cheby- 
shev's inequality, P(V < E{V) + h x SD(V)) > 1 - hr 2 . Therefore, to prove the 
result in Theorem 3, we want to derive the upper bounds for E(V) and SD(V) and 
show that Mh > 0, 3B and M s.t. Vm > M, P(V < 0) > 1 - h~ 2 . 
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We will first present a result from Polya (1945), which will be very useful for our 
proof. For x > 0, 



1 + \j 1 — exp( x 2 

7T 



(l + 5(x)) with sup|<J(x)| < 0.004. (35) 

x>0 



The variance of V is shown as follows: 



Var(F) = m~ 2 ^ Var(^) + mT 2 Cov(^, l£ 



i=l 



Write w = su with ||u|| 2 = 1 where s = (Bk/m) 1 / 2 . By (C5), (C6) and (C7) in 
Theorem 3, for sufficiently large m, 

mm m 

Var(Vi) = ^/(|bfu| < d)Var(^) + ^/(|bfu| > d)Var(^) 

8=1 1=1 
m 

= [^/(|bfu|>d)Var(^)](l + o(l)), (36) 



t=i 



and 



^Tcov^vv 



]T/(|bfu| < d)J(|bJu| < d)Cov(V t ,V 3 : 



+2^/(|bfu| < d)l(\bju\ > <£)Cov(Vi,Vj] 



+ ^/(|bfu| > d)I{\bju\ > d)Co-v(Vi,Vj 



[^/(|bfu| ><f)/(|bju| >d)Cov(^,y i )](l + o(l)). (37) 

#9 



We will prove (36) and (37) in detail at the end of proof for Theorem 3. 



For each pair of and Vj, it is easy to show that 



Cov(K-, V,-) = 4(bfw)(b T w) P{Ki < bJw,Kj < b T w) - $(a i bfw)$(a i b T w) 
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The above formula includes the Var(Vi) as a specific case. 



By Polya's approximation (35), 



Var(^) = (bfw) 2 exp j - ^(aibfw) 2 j(l + 6,) with |^| < 0.004. 



Hence 



7ft 7/t ^ 

^/(|bfu| > d)Vax(K) < £Vexp{ - -(a 4 rfs) 2 } (1 + ^) 

i=l »=1 71 " 

< 2ms 2 exp| (amines) 2 1. 

To compute Cov (Vi,Vj), we have 

P(Ki < bj w, Kj < bjw) 

(I4|)V^ + aibfwv /^(IpII) 1 ^ + a,bjw 



<&(<2jbf w)$(ajbjw) + 0(ajbfw)0(ajbj 1 w)aia :/ cof^(l + o(l)), 



where 5^- = 1 if > and —1 otherwise. Therefore, 

Cov(V5, Vj) = 4:(bJw)(bJw)(j)( y a i bJw)(f)( y a j hJw)a i a j cov^(l + o(l)), 

and 



^/(|bfu| > d)/(|bju| > d)Cov(V;,^)| 



#3 

Consequently, we have 



< ^ S% eX P { ~ («min^) 2 }«Lx|co4|(l + o(l)). 



2 r 2 i r 1 -v -v 
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We apply (C4) in Theorem 3 and the Cauchy-Schwartz inequality to get ^ Yli ■ cov^ < 
(52i=k+i ^i) 1 ^ 2 — anc ^ concm de that the standard deviation of V is bounded by 



V2sm 1/2 exp j - ^(a min ds) 2 }amax(?7) 1/4 - 



In the derivations above, we used the fact that b„- u < HbJU < 1, and the covariance 



matrix for Ki in (21) of the paper is a submatrix for covariance matrix of K, t in (10). 

Next we will show that E(V) is bounded from above by a negative constant. Using 
x($(x) — |) > 0, we have 



-E{V) 



m 



i=l 



T 

; W 



$(aib;w) 



ll 



2ds 



> ^£l{\bTu\>d)[*(a i d8)-l 
i=i 

^/(|bfu| <d) h( ai ds) 1 



m 

i=i 



i=l 



By (C5) in Theorem 3, — X^=i ^(l^i u l — d) ~ 0' so f° r sufficiently large m, we have 



i=i 



An application of (35) to the right hand side of the last line leads to 



m 14 — ' 2 

i=i 



7T 



Note that 



1 — exp( x 

71 



oo j 

./'"' — ( •'•")' > — •'' 



7T " (Z + 1)! V 7T 

J=0 v 7 



7T ^ l\ 7T 7T 

Z=0 



x exp( X 

7T 
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so we have 



-E{V) > -s 



To show that V/i > 0, 3B and M s.t. Vm > M, P(V < 0) > 1 - h~ 2 , by 
Chebyshev's inequality and the upper bounds derived above, it is sufficient to show 
that 



—s 2 J^a min exp | - ^(a min <is) 2 } > hsm 1/2 exp j - ^(a min ds) 2 ja max 7? 1/4 



2 

Recall s = (Bk/m) 1 ^ 2 , after some algebra, this is equivalent to show 



d 2 (5A;) 1/2 (7r)- 1/2 exp{-^(a min rfs) 2 ) > 2h V 



1/4 ^max 



By (C6), then for all h > 0, when B satisfies d 2 ^) 1 / 2 ^)- 1 / 2 > 2hr]^ 4 S, we have 
P(F < 0) > 1 - /i~ 2 . Note that k = 0{m K ) and 77 = 0(m 2K ), so A;" 1 / 2 ^ 1 / 4 = 0(1). 



To complete the proof of Theorem 3, we only need to show that (36) and (37) are 
correct. 



To prove (36), by (38) we have 



^/(|bfu| < d)Var(V-) < ^/(|bfu| < d)s 2 d 2 



i=l i=l 



and 

mm „ 

^/(|bfu| > d)Var(Vi) > ^/(|bfu| > d)s 2 d 2 exp f " 

i=l i=l 



_ 2 2 
7T 



Recall s = (Bk/m) 1 / 2 , then by (C6) and (C7), exp jfa^s 2 } = 0(1). Therefore, by 
(C5) we have 

Er=iA|bfu|<rf)Var(^) 



E^i^lbful >d)Var(y i ) 



—7-0 as m — > 00, 



so (36) is correct. With the same argument and by (39), we can show that (37) is 



also correct. The proof of Theorem 3 is now complete. 
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Proof of Theorem 4: Note that ||W LS - W* s || 2 = || (X T X)- 1 X T fj.\\ 2 . By the 
definition of X, we have X T X = A, where A = diag(Ai, ■ ■ • , A&). Therefore, by the 
Cauchy-Schwartz inequality, 

ii%s-w- s ii 2 ^[x:(^) 2 ] 1/2 <w 2 (i:}) 1/2 

i=i Ai i=i Ai 

The proof is complete. 
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